61 research outputs found

    Identifying topics of interest of Mendeley users using the text mining and overlay visualization functionality of VOSviewer

    This paper presents the results of a study in which we have analysed the topics of interest of Mendeley users (i.e. Students, PhDs, Post Docs, Researchers, Professors, Librarians, Lecturers & other Professionals) using text mining and visualization techniques. Besides analyzing topics of interest of Mendeley users, we have also identified fields of science for which readership information can be an interesting source of information complementary to citation information. For this purpose, we have used WoS citation data and Mendeley readership data for a set of 980,698 WoS publications (articles and reviews) with a DOI from 2011. The VOSviewer software tool (Van Eck & Waltman, 2010) was used to create so-called overlay visualizations. These visualizations show additional information on top of a base map. Two types of base maps were used. A base map containing the 250 WoS subject categories was used to analyze differences in readership activity across research fields and to analyze differences in interest between types of users. Base maps containing terms extracted from titles and abstracts using the text mining functionality of VOSviewer (Van Eck & Waltman, 2011) were used to analyze differences in readership activity within research fields. Merit, Expertise and Measuremen

    Analyzing the activities of visitors of the Leiden Ranking website

    Purpose: To get a better understanding of the way in which university rankings are used. Design/methodology/approach: Detailed analysis of the activities of visitors of the website of the CWTS Leiden Ranking. Findings: Visitors of the Leiden Ranking website originate disproportionately from specific countries (regions). They are more interested in impact indicators than in collaboration indicators, while they are about equally interested in size-dependent indicators and size-independent indicators. Many visitors do not seem to realize that they should decide themselves which criterion they consider most appropriate for ranking universities. Research limitations: The analysis is restricted to the website of a single university ranking. Moreover, the analysis does not provide any detailed insights into the motivations of visitors of university ranking websites. Practical implications: The Leiden Ranking website may need to be improved in order to make it clearer to visitors that they should decide themselves which criterion they want to use for ranking universities. Originality/value: This is the first analysis of the activities of visitors of a university ranking website.

    A principled methodology for comparing relatedness measures for clustering publications

    There are many different relatedness measures, based for instance on citation relations or textual similarity, that can be used to cluster scientific publications. We propose a principled methodology for evaluating the accuracy of clustering solutions obtained using these relatedness measures. We formally show that the proposed methodology has an important consistency property. The empirical analyses that we present are based on publications in the fields of cell biology, condensed matter physics, and economics. Using the BM25 text-based relatedness measure as the evaluation criterion, we find that bibliographic coupling relations yield more accurate clustering solutions than direct citation relations and cocitation relations. The so-called extended direct citation approach performs similarly to or slightly better than bibliographic coupling in terms of the accuracy of the resulting clustering solutions. Conversely, using a citation-based relatedness measure as the evaluation criterion, BM25 turns out to yield more accurate clustering solutions than other text-based relatedness measures.

    Characterizing in-text citations in scientific articles: A large-scale analysis.

    We report characteristics of in-text citations in over five million full text articles from two large databases – the PubMed Central Open Access subset and Elsevier journals – as functions of time, textual progression, and scientific field. The purpose of this study is to understand the characteristics of in-text citations in a detailed way prior to pursuing other studies focused on answering more substantive research questions. As such, we have analyzed in-text citations in several ways and report many findings here. Perhaps most significantly, we find that there are large field-level differences that are reflected in position within the text, citation interval (or reference age), and citation counts of references. In general, the fields of Biomedical and Health Sciences, Life and Earth Sciences, and Physical Sciences and Engineering have similar reference distributions, although they vary in their specifics. The two remaining fields, Mathematics and Computer Science and Social Science and Humanities, have different reference distributions from the other three fields and between themselves. We also show that in all fields the numbers of sentences, references, and in-text mentions per article have increased over time, and that there are field-level and temporal differences in the numbers of in-text mentions per reference. A final finding is that references mentioned only once tend to be much more highly cited than those mentioned multiple times.

    Co-word analysis applied to highly cited papers in Library and Information Science (2007-2017)

    This paper aims to identify the conceptual structure of the Library and Information Science category in the Web of Science for the period 2007-2017, using the analytical tool Essential Science Indicators. Starting from highly cited papers, the methodology consisted of co-word analysis combined with multivariate statistical techniques and visualization through a science map. The main results showed that the areas of greatest interest to researchers were studies of Web 2.0 based on collaborative user participation; the evaluation of scientific activities; alternative metrics (altmetrics) developed on social and academic platforms; security and trust in virtual environments; and the application of information technologies in companies and digital e-commerce platforms.

    Topic identification challenge

    Merit, Expertise and Measuremen

    Yet Another Ranking Function for Automatic Multiword Term Extraction

    Term extraction is an essential task in domain knowledge acquisition. We propose two new measures to extract multiword terms from a domain-specific text. The first measure is both linguistically and statistically based. The second measure is graph-based, allowing assessment of the importance of a multiword term within a domain. Existing measures often solve some, but not all, of the problems related to term extraction, e.g., noise, silence, low frequency, large corpora, and the complexity of the multiword term extraction process. Instead, we focus on managing the entire set of problems, e.g., detecting rare terms and overcoming the low frequency issue. We show that the two proposed measures outperform precision results previously reported for automatic multiword extraction by comparing them with the state-of-the-art reference measures.

    Identifying diachronic topic-based research communities by clustering shared research trajectories

    Communities of academic authors are usually identified by means of standard community detection algorithms, which exploit ‘static’ relations, such as co-authorship or citation networks. In contrast with these approaches, here we focus on diachronic topic-based communities, i.e., communities of people who appear to work on semantically related topics at the same time. These communities are interesting because their analysis allows us to make sense of the dynamics of the research world, e.g., migration of researchers from one topic to another, new communities being spawned by older ones, communities splitting, merging, ceasing to exist, etc. To this end, we are interested in developing clustering methods that can correctly handle the dynamic aspects of topic-based community formation, prioritizing the relationship between researchers who appear to follow the same research trajectories. We thus present a novel approach called Temporal Semantic Topic-Based Clustering (TST), which exploits a novel metric for clustering researchers according to their research trajectories, defined as distributions of semantic topics over time. The approach has been evaluated through an empirical study involving 25 experts from the Semantic Web and Human-Computer Interaction areas. The evaluation shows that TST exhibits a performance comparable to the one achieved by human experts.